\chapter{\label{chp:mesh}Mesh-based Monte Carlo Method in Time-domain Widefield Fluorescence Molecular Tomography}
In all previous chapters the Monte Carlo method were performed in volxel-based geometry, and we have demonstrated that the voxel-based Monte Carlo (vMC) method allows computation of functional and fluorescent time-gated Jacobians for whole-body tomography within a couple of minutes per source-detector pair. However, difficulty arises when applying the vMC algorithm, as the commonly employed cubic shape voxels in regular 3D grid cannot accurately simulate the curved boundaries. One remedy is to raise the voxel density, but doing so drastically increases the memory and computational burden. This burden is further amplified when using widefield illumination strategies, where photons interact with a much larger boundary area compared to the conventional punctual light source scheme. Therefore, a Monte Carlo algorithm that has the flexibility of accurately representing the arbitrary domain shape is highly desired\let\thefootnote\relax\footnotetext{
Portions of this chapter will appear in:
\begin{itemize}
 \item[] J. Chen, Q. Fang and X. Intes, ``Mesh-based Monte Carlo Method in time-domain widefield fluorescence molecular
  tomography,'' \textit{J. Biomed. Opt.}, submitted for publication.
 \item[] M. Pimpalkhare \textit{et al}, ``Ex vivo fluorescence molecular tomography of the spine,'' \textit{Int. J. Biomed. Imaging}, submitted for publication.
\end{itemize}}.

In this chapter, we present an optimized software platform for FMT with spatially extended arbitrary sources in the presence of complex boundary conditions. The novelty of this approach lies in the accurate boundary representation and rapid calculation when employing the mesh-based Monte Carlo (mMC) method \cite{fang2010}. Based on a currently available mMC code \cite{mmc}, an extended hybrid parallel version is implemented to model arbitrary illumination patterns. A comparison is performed between this method and our previously developed vMC code in terms of quantitative accuracy and computational efficiency for whole-body tomographic reconstruction in time-domain. This method is then validated and evaluated by small animal full-body simulations and experiments in tomographic settings.

\section{\label{sec:meth_mmc}Forward widefield mesh-based Monte Carlo method}
The mMC method utilizes fast ray-tracing techniques to accelerate calculation and is capable of simulating point illumination on complex geometry \cite{fang2010}. Here the widefield illumination is developed based on its version 0.8 release as provided at \cite{mmc}. In order to simulate a realistic widefield source with spatial intensity variance, the illumination patterns are represented as 2D-grid (1mm$^2$ pixel) images with an intensity value assigned at each grid element (Fig.~\ref{fig:flow}A). The initial positions of the total photons are uniformly distributed over the illumination image by using the uniform random number generator. Any photon falling into one pixel has the pixel's intensity value assigned as its initial packet weight, allowing for arbitrary illumination settings (Fig.~\ref{fig:flow}B). A simulated photon is then projected along the z-axis and enters the mesh at an intersection point on the mesh surface. To identify the surface injection point, we apply a ray-tracing step to obtain its 3D coordinates by testing all surface triangles using a Havel-Herout ray-tracing algorithm~\cite{Havel2010} (Figs.~\ref{fig:flow}b and \ref{fig:flow}c). In case multiple intersection points are identified, the intersection with the shortest distance to the initial position (Fig.~\ref{fig:flow}b) is selected as the injection point on the surface. After a photon's injection point is determined, its propagation in the media is simulated following the same rules as in classical MC techniques, until it exits the surface or the total time of interest is reached.
\begin{figure}[htbp]
\includegraphics{widefmmc-1}
\caption{\label{fig:flow}The illustration for generating the widefield illumination using a digital mouse phantom. A: arbitrary illumination pattern; B: photons projected to the surface; C: surface region of interest; a: initial position of a photon; b: the actual injection point of this photon; c: a potential injection point of this photon (discarded since the distance to the initial position is longer than b).}
\end{figure}

Unfortunately, complete surface triangle test of this additional ray-tracing step imposes a significant computational overhead, particularly when the surface mesh is very dense. For example, the calculation time for the ray-triangle intersection test can be 50x more than that for the photon propagation on a mouse model. In order to accelerate the computational time for the widefield method to match punctual illumination method, a few algorithmic optimizations are implemented, particularly considering the fact that the x-y coordinate of the injection position is dependent on the photon's initial position, while the z coordinate is dependent on the intersection of the photon and the surface. First, we confine the initial positions to the region of interest~(ROI), and the surface triangles in the ROI are picked out (the dark gray surface in Fig.~\ref{fig:flow}C). Second, using the projected triangles on the x-y plane, a subset of triangles is selected and stored in memory. For any launched photon, a culling technique \cite{Wald2004} is then implemented to confine ray-triangle tests only to the triangles with their bounding boxes containing the photon's x-y coordinate. These operations eliminate redundant calculations thus allow for acceleration in computation time.


%\section{Generating the Jacobians}
\section{Parallelization and computational efficiency evaluation}
\label{sec:sim_setting}
\noindent
For forward simulations, we incorporate the widefield related computation in the mMC software platform. In addition to the previously implemented multi-threading and single-instruction multiple-data (SIMD) parallelism, we further enhance the modeling efficiency by integrating Message Passing Interface (MPI) based parallel computing technique with the code. This allows one to run mMC in a distributed memory system, such as a high-performance cluster. To utilize the maximum computational power of platforms with both multi-threads and multi-nodes, an adaptive hybrid parallelization method using the communication protocol MPI and OpenMP APIs is implemented. The MC approach is highly parallelizable, that is, the large number of photons can be broken up into smaller sets and calculated independently. In terms of distribution, the MPI protocol has photons equally divided and distributed to all nodes, while the OpenMP protocols dynamically set the assignment of photons for the threads on a node. The seed for the random number generator is assigned based on the thread and node ID. A reduce operation is performed to sum up the result to a single node (master node) after all nodes finish computing.

In this particular work, the computational efficiency of the hybrid parallel code was tested using Single/Dual Core CPUs (BlueGene/Opteron on CCNI, RPI). To estimate the scalability of multi-threading and multi-CPU calculation, quantitative metrics were computed to measure the performance of the parallel programs. We defined the speedup as the ratio of the execution time on a single processor (the sequential version) to that on a multi-node cluster:
\begin{equation}
\label{eq:speedup} S(n) = \frac{t_s}{t_p(n)},
\end{equation}
where $n$ is the number of threads/processes used in the parallel simulation, $t_s$ is the execution time on a single-thread processor and $t_p$ is the execution time on a parallel computer. $S(n)$ therefore describes the scalability of the system as the number of threads/processes increases. The efficiency was computed as:
\begin{equation}
\label{eq:effi} E(n)=\frac{t_s}{n\times t_p(n)},
\end{equation}
which describes the fraction of time that is being used by the processors for the computation.

Our implementation allows the generation of spatially complex illumination patterns over arbitrary oriented surfaces to accurately model non-contact widefield illumination strategies. After the optimizations mentioned in Section~\ref{sec:meth_mmc}, the overall computational overhead of simulating extended time-resolved sources in this complex scenario was only marginally larger (~5-10\%) over a point source configuration, while providing much greater functionality.

Fig.~\ref{fig:speedup}a shows the speedup of mMC calculation while using different threading settings with $10^7$ and $10^8$ photons. For simulations using a larger number of photons, the speedup curve has a marked improvement toward the ideal curve, implying higher efficiency ($E(16)=93.2\%$ for $10^7$ photons and $E(16)=99.3\%$ for $10^8$ photons with single-thread computation setting). The difference between the three threading methods is minimal when the same number of photons is employed ($<3\%$), with pure threading being the closest to ideal. However, the current generation of multi-core hardware has a limitation on total number of threads thus limits the speedup potential of pure threading technique.

Fig.~\ref{fig:speedup}b depicts the time cost for mMC and vMC using different number of processes under single-thread parallel settings while having similar numbers of elements/valid voxels. Under all available processes settings (128-4,096), mMC remains roughly 4-5 times faster than vMC, while for large number of processes (4,096) the program approaches its scaling bottleneck.
\begin{figure}[htbp]
 \centering
 \includegraphics{widefmmc-4}
 \caption{\label{fig:speedup}(a) The speedup using widefield mMC with $10^7$ and $18^8$ photons on single-thread nodes, 4-thread nodes and an 8-thread node; (b) the time cost for mMC and vMC on single-tread nodes.}
 \end{figure}

\section{Accuracy comparison}
\label{sec:meth_accuracy}
\noindent Due to a wider coverage range of the illumination, the simulation result of widefield illumination can be more sensitive to errors of boundary modeling than conventional point illumination used in optical tomography. To quantitatively assess the performance of mMC and vMC dealing with different types of illumination on complex models, a simulation using high resolution voxel-based model was first set up as the reference. The light propagation profiles using mesh-based model and re-sampled low resolution voxel-based model with the same number of tetrahedral elements/valid voxels~(voxels that are not air) were then computed.

To mimic a pre-clinical tomographic reconstruction, the high-resolution Digimouse model (0.1mm$^3$) generated from CT data~\cite{Dogdas2007} was employed to create the mesh model and low resolution voxelized model~(Fig.~\ref{fig:accuracy_set}). The software package Iso2mesh \cite{Fang2009a} was used to convert the volumetric model to a mesh model containing 4,075 node, 22,379 tetrahedron elements. The corresponding low-resolution voxel-based model had a comparable number of 22,433 valid voxels resulting in 1mm$^3$ voxels, which is the typical voxel size in preclinical applications. Notice here the mesh model was homogeneously tesselated with no denser sampling around the boundary or internal organs to assure an impartial comparison with the homogeneously voxelized model. The illumination pattern covered a 5-7cm area with full coverage of the transverse plane. A cross-sectional plot is shown in Fig.~\ref{fig:accuracy_set}b for mesh- and voxel-based model, respectively. In all simulations above, the numbers of photons were kept at $10^8$ according to Chapter \ref{chp:comp}. The optical properties were set to $\mu_a = 0.3$cm$^{-1}$, $\mu_s = 15$cm$^{-1}, g = 0.9, n= 1.37$, which are typical values for mouse tissue in the NIR spectral region according to literature \cite{Alexandrakis2005}. The time profiles were recorded with a 300ps gate width and 20ps time shift between the gates to replicate our imaging platform for optimal experimental performance.
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{widefmmc-2}
\caption{\label{fig:accuracy_set}(a) The high resolution mouse atlas employed in the accuracy comparison. (b) The selected slice in (a) for boundary visualization overlapped with the mesh model and the low resolution voxel model. The red line shows the boundary of the high resolution CT image, while the black and blue lines represent that of the mesh- and low resolution voxel model, respectively.}
\end{figure}

The time-gated profiles of light propagation using the mesh- and high-resolution models were scaled down to the size of the low resolution profile, then the percentage errors of mesh- and low-resolution models in relation to the high-resolution model were calculated to assess the quantitative accuracy. Further evaluation of the methods was conducted by calculating the overall root mean square difference (RMS difference) between the results using the two methods and the high resolution standard:
\begin{equation}
\label{eq:speedup} RMS(t) = \sqrt{\frac{\sum_{i=1}^N(p(t)-p_0(t))^2}{\sum_{i=1}^N{p_0(t)^2}}},
\end{equation}
where $N$ is the number of voxels in the low-resolution voxel model, $t$ is the time, $p_0$ is the high resolution result and $p$ is that calculated by low resolution mMC or vMC.

\subsection{Forward model accuracy}
Fig.~\ref{fig:accu_forward}a and \ref{fig:accu_forward}b shows the forward photon propagation profiles at $y = 60$mm for the full field and point illumination, respectively. As expected, mMC fits the boundaries much better compared to vMC for widefield case, while both mMC and vMC matches well to the standard for point source case. Fig.~\ref{fig:accu_forward}c presents the TPSFs for these two illumination settings at the same detector indicated in Fig.~\ref{fig:accu_forward}a and \ref{fig:accu_forward}b. Both the results using vMC and mMC methods fit the standard well for the point illumination, while the TPSFs for widefield illumination result in increased error for vMC compared to mMC, especially around the rising part (early gates) of the TPSF. A comparison of the different datatype extracted from the time-resolved measurements at these two detectors are listed in table~\ref{tab:tpsf}, with the Continuous Wave (CW) measurement being a percentage error and the other parameters as absolute errors in time. The error in the $1^{st}$ moment of TPSF (CW measurements) reflects the integrated intensity difference and that in the $2^{nd}$ and $3^{rd}$ moments (mean flight time and variance) represent a change in position and shape of the TPSF. Overall, the mMC results are more accurate than the vMC ones, while the inaccuracy at the side detector (detector 1) is greater than that at the center detector (detector 2), due to the fact that the side detector is closer to the edge thus more affected by the mismatched boundary area. Comparing the widefield and point illumination, the error is greater in the widefield case as expected, as a result of an illumination pattern probing larger boundary area where mismatch boundaries exist.

\begin{table}[!h]
%    \tabcolsep 0pt
       \caption{Error table comparing the low resolution vMC and mMC with the standard reference. The CW measurement error is in percentage, and the others are in absolute time. Widef.: widefield; MFT: mean flight time; Var.: variance. }
  \label{tab:tpsf}
  \vspace*{-12pt}
  \centering
  \def\temptablewidth{1.0\textwidth}
  {\rule{\temptablewidth}{1pt}}
  \begin{tabular}{rccccccccc}
   &&\multicolumn{2}{c}{CW(\%)}&\multicolumn{2}{c}{Tmax (ps)}&\multicolumn{2}{c}{MFT (ps)}&\multicolumn{2}{c}{Var. (ps)}\\
  & &Det 1 & Det 2 &Det 1& Det 2 &Det 1& Det 2 &Det 1& Det 2\\
  \hline
   \multirow{2}{*}{Widef.} &mMC&-1.29 &-0.81&40&20&20.81&15.94&-3.93&-2.19\\
  & vMC& -3.35&1.99&80&0&67.83&18.81& -8.51&-3.00\\
  \hline
   \multirow{2}{*}{Point} &mMC& -1.62&-0.38&0&20& -6.96&20.93&1.55&-3.43\\
  & vMC& 2.48&0.73&0&20&21.12 &18.81 &-3.30&-3.00\\
    \end{tabular}
  {\rule{\temptablewidth}{1pt}}
\end{table}
\begin{figure}[htbp]
 \centering
 \includegraphics[width=.9\textwidth]{widefmmc-5}
 \caption{\label{fig:accu_forward}Forward simulation comparison for (a) widefield and (b) point source illumination. mMC:blue, vMC:grey; high resolution standard (std): solid color filled contours~(background). (c) The normalized time-resolved detector readings (black crosses and circles in a and b).}
\end{figure}

\subsection{Accuracy in Jacobians}
Based on the comparison between different MC-based reconstruction methods in Chapter \ref{chp:comp}, we selected the forward-adjoint MC method to generate time-gated Jacobians with a computational efficient manner, thanks to the smaller dataset acquired with widefield illumination strategies (Chapter \ref{chp:pmc} and \ref{chp:fluo}).

The contours of continuous wave Jacobians using both the mMC and vMC methods are shown in Fig.~\ref{fig:accu_result}a and \ref{fig:accu_result}b for widefield and point illumination, respectively. In Fig.~\ref{fig:accu_result}c and \ref{fig:accu_result}d, the Jacobian contour comparison at the 25\% rising gate is displayed. In both cases, contours using the mMC and vMC methods match the reference contour well for the point-point SD pair. However, the contour using the mMC method fits the reference much better than that using the vMC method in a widefield-point SD setting, especially around the boundaries where the voxel model experiences increased mismatch. An average RMS difference of 10.8\% and 9.4\% was achieved in the point-point SD pair Jacobians using vMC and mMC, respectively, while the RMS difference was 23.5\% and 13.4\% for the widefield-point SD pair. These values may be compared to a baseline RMS difference of 2.4\% and 3.8\%, obtained using two high resolution simulations with $10^9$ photons and different random seeds. The Jacobian accuracy comparison in time-domain is shown in Fig.~\ref{fig:accu_result}c. For the point-point SD pair, the RMS differences using the two methods do not experience significant change with respect to the changes in gate position. However, for the widefield case, the mMC has a considerably lower RMS difference compared to the vMC for early gates.
\begin{figure}[htbp]
 \centering
 \includegraphics[width=\textwidth]{widefmmc-6}
 \caption{\label{fig:accu_result}The contour comparison of Jacobians using mMC and vMC for (a, c) point and (b, d) widefield illumination in (a,b) continuous wave and (c, d) time-gated modes. The contours filled with solid colors are from the high resolution simulation. (e) The RMS comparison in time-domain for four cases and all time gates.}
\end{figure}

\section{\textit{in silico} investigation}
\subsection{Mouse model reconstruction}
The above mentioned models were utilized in a full tomographic reconstruction study. The high-resolution mouse model was employed to generate fluorescence measurements with the pre-segmented kidneys labeled with simulated fluorescent markers (effective quantum yield 0.1; $\tau$ = 0.8ns). 64 sliding half-space illumination patterns and 64 point detectors were simulated within the abdomen section (x: 50-70mm, y: 8.5-28.5mm, 400 pixels for each pattern) using 512 single-thread nodes on BlueGene. The detectors were evenly spanned over the region of interest with a 2.857mm separation (Fig.~\ref{fig:widefmmc_sim}a). A 25\% rising gate (early gate) was selected and the Born normalization \cite{Ntziachristos2001} of the corresponding time point spread functions (TPSFs) was employed as the datatype for reconstruction herein. The conjugate gradient method was employed to solve the inverse problem and the iteration is stopped when the relative error is less than 0.02.
\begin{figure}[htbp]
 \centering
 \includegraphics{widefmmc-3}
 \caption{\label{fig:widefmmc_sim}(a) Mesh-based mouse model with kidneys depicted in red. The patch of black and red shows one sample scanning pattern out of a total of 64. The black areas indicate where the photon source is off, while the red area shows where it is on. The black circles at the bottom represent the detectors. The simulation reconstruction result. (b) The 50\% isovolume of the reconstruction. (c) 5 slices from $x =$50mm to $x = 70$mm with a separation of 5mm.}
 \end{figure}

The reconstruction using mesh-based MC methods is shown in Figs.~\ref{fig:widefmmc_sim}b and \ref{fig:widefmmc_sim}c. The kidneys are accurately reconstructed in regard to their original discretization. The maximum reconstructed value for the mesh-based MC was 91.0\% of the expected value. The centroids of the two kidneys had position error of 1.02mm and 0.78mm, respectively. The run time for creating the time-domain mesh-based Jacobian was 3.47h for $64\times64$ SD pairs (in total 4,096) on 512 nodes. The maximum reconstructed value for the mMC method was 3.4\% more accurate toward the expected value compared to the vMC method.

\subsection{\textit{ex vivo} spine imaging}
Even though the NIR window exhibit the lowest light attenuation overall, the use of FMT for preclinical studies has been limited to small animals, especially murine models. Use of FMT for larger animal models is precluded due to relative high-attenuation of the tissue and larger volumes to be imaged. However, while mice provide proof of principle and allow testing of a variety of therapeutic modalities, mouse models have some limitations, such as only capable of performing short-term experiments, a homogenous genetic background that is unlike humans, and the knockout models do not always faithfully represent the human disease. Naturally occurring large animal models of human diseases have become increasingly important, due to higher degree of similarities compared to human diseases, despite the cost and the extensive clinical attention they require. Larger animals also provide advantages such as being reasonably outbred, long-lived allowing for longitudinal studies, are more similar in size to a neonate or small child providing an opportunity to address issues related to scaling up therapy, and many physiological parameters including the immune system are more similar to those in humans versus those in mice. For such animal models, however, it is unlikely that whole-body FMT in transmission will ever be achieved. Though, FMT can still play a critical role in molecular imaging studies of these preclinical models by offering the opportunity to image non-destructively \emph{ex vivo} harvested samples.

In this section, we demonstrate the potential of mMC for such novel application of FMT. We evaluate the performance of mMC-based FMT to image a dog spine model based on real specimen, which has complex small boundaries and broad span of optical properties. More precisely, we focus on imaging the intervertebral disk with the long term goal of applying this new method in orthopedic research in investigating disk degeneration's role in lower back pain.

\subsubsection{Large animal spine model}
%Models of disc degeneration are mainly used to address scientific questions relating to disease progression and treatment. Many studies in pre-clinical settings on disc degeneration have been done on species such as rabbits and dogs, due to the comparative ease of access versus human tissue. Rabbit and dog spines have been used in diagnostic radiology and ultrasound to evaluate the performance of these modalities and to further interpret their results. Herein, we focus on the first foray into establishing the potential of the FMT on an ex vivo spine imaging.

Figure \ref{fig:spine-model} shows the specific dog spine specimen used in this study. The muscles and soft tissues around the vertebrae have been removed to give it an approximately cylindrical geometry.
\begin{figure}[htbp]
 \centering
 \includegraphics{spine-1}
 \caption{\label{fig:spine-model}(a) The dog spine specimen used in this study, with overall dimensions approximately 4.5cm by 2cm by 2cm. (b) The mesh model with segmented regions. The green mesh shows the simulated fluorophore for the fluorescence signals. The spinal cord, disk and the fluorophore are shown in full 3D volume, and the bone mesh is for $y >30$ to show the crosssection of the internal tessellation.}
 \end{figure}

The geometry of the spine specimen was first captured by MRI to derive high-resolution anatomical templates. The spine was placed in a detachable holder and emersed into fomblin during the MRI process. The resulting MRI images cover a cubic volume containing 1 intervertebral disc and part of the 2 adjacent vertebrae, with 175$^3$ voxels and a resolution of 0.24mm per voxel. This volumetric model was then segmented into three non-overlapping regions: bone, spinal cord and intervertebral disc, using the ITK-SNAP package. These three regions were considered to be the main optical region of interest with differing optical properties in this application.

The segmented high resolution model cannot be directly used in FMT due to the associated large inverse problem size. To be able to perform computationally efficient FMT while retaining accurate boundary presentation, a 3D mesh was created from the segmented data employing a modified version of MeshSim (Simmetrix Inc., NY, USA) to incorporate the label information. The full 3D mesh contains 469,415 elements, with a denser tessellation around the external and internal boundaries. This number was chosen as a empirical golden mean between processing time and resolution. Fig. \ref{fig:spine-model}b shows the mesh model with contiguous segmented regions.

\subsubsection{Illumination-detection settings}
Due to the relatively rigid shape of the spine, the illumination-detection setting for this \textit{in silico} investigation was mimicking a multi-view transmittance imaging platform developed in our group. The multi-view configuration allows capturing of full tomographic information, commonly achieved with high angular sampling configuration. We used a specimen holder to rotate the spine around y-axis (along the spinal cord direction) in intervals of 90 degrees for a total of 4 acquisition views.

To replicate the experimental patterns in this platform, 12 half-space bar-shaped patterns sliding longitudinally over the rotating axis were simulated for each of the 4 orthogonal views, with a total of 48 patterns acquired. The CCD detection in transmittance was simulated by 156 fixed detectors (12 $\times$ 13 detector array) evenly spanned over the spine area with separation of 2.48mm. Optimally, this imaging protocol provided 7,488 source-detectors combination. A conceptual depiction of the pattern transmission over the spine is provided in Fig. \ref{fig:spine-sd}.
\begin{figure}[htbp]
 \centering
 \includegraphics{spine-2}
 \caption{\label{fig:spine-sd}Pattern illumination and detection settings for reconstructing the fluorophore distribution. The arrows show the projection directions of the patterns.}
 \end{figure}

\subsubsection{Reconstructing methods}
For a more robust reconstruction, the normalized Born formulation was employed. This method can reduce the impact of the heterogeneous background on the localization and quantification of the fluorochrome \cite{Soubret2005, Ntziachristos2001}, while also allows for less stringent calibration protocols as instrument dependent parameters are canceled out in the normalized formulation. However, the normalized Born approach may not reduce the effect of background optical properties on the fluorescence reconstructions completely. Hence, we employed Jacobians based on both homogenous and heterogeneous optical properties within the Born normalized formulation for tomographic reconstructions.

The optical properties listed in Table \ref{tab:spine-mu}) \cite{Cheong1990, Firbank1993} were employed for different segmented regions in the heterogeneous case, while the optical properties of the bone were used for all regions for the homogeneous case. In all the work herein, the measurements used in the inverse problem were based on the forward mMC simulation with heterogeneous optical properties, to ensure a realistic dataset. A fluorescent object with size 3mm by 3mm by 3mm, was placed in the intervertebral disk and used to generate the fluoresce signals, with effective quantum efficiency equal to 0.1 and lifetime 500ps for simulating ICG (Fig. \ref{fig:spine-model}b). Time-gated forward-adjoint method with $10^8$ photons for each source or detector was employed to generate the time-gated Jacobian, while the CW Jacobians were based on an integration of the light propagation over time. An example of the ratio of heterogeneous over homogeneous CW Jacobians is provided in Fig. \ref{fig:spine-Jac}, showing the difference in light propagation with heterogeneity perturbations. In the case of time-gated reconstruction, the 25\% rising gates were first used for improving resolution, then this result was set as the initial estimation of a late-gate reconstruction (50\% decaying) for quantification accuracy. All reconstructions herein were obtained using a least square method stopping after 100 iterations. No soft or hard priors were included in the optical inverse problem.
\begin{table}
\caption{\label{tab:spine-mu}{Optical properties assigned to different segmented regions of the spine.}}
\centering
\def\temptablewidth{0.6\textwidth}
{\rule{\temptablewidth}{1pt}}
\begin{tabular}{ccc}
 & $\mu_a$ (cm$^{-1}$) & $\mu_s$ (cm$^{-1}$) \\
\hline
Bone & 0.04& 12 \\
Spinal cord & 0.121& 8.9\\
Disk & 0.008& 5
\end{tabular}
{\rule{\temptablewidth}{1pt}}
\end{table}
\begin{figure}[htbp]
 \centering
 \includegraphics{spine-3}
 \caption{\label{fig:spine-Jac}(a) Homogeneous, (b) heterogenous and (c) the ratio of heterogenous and homogenous Jacobians for one source-detector pair in log scale.}
 \end{figure}

\subsubsection{Results}
The aim of the \textit{in silico} study was two-fold. First, the simulations were conducted to evaluate the performance of mMC on reconstructing fluorophore localized in a complex geometry under wide-field time-gated FMT settings. Second, to test if these simulations could be employed to guide and refine the FMT reconstruction and experimental imaging protocols. Parameters such as data type, number of views and impact of optical properties were investigated. The reconstructions fidelity was assessed in terms of localization and quantification.

Figure \ref{fig:spine-recon} provides the 50\% isovolume of CW and TG reconstructions using homogeneous and heterogeneous Jacobians. This comparison demonstrates the size of the reconstructed object using different reconstruction settings and clearly shows the improved fidelity in reconstructions when using time-gated heterogeneous Jacobians. It is worth noting that in all cases studied, the z dimension of the isovolume is larger than the $x$ and $y$ ones, implying a loss of resolution for z axis. This is potentially a result of the illumination-detection settings due to the sliding direction (along $y$) and the detection views (missing $x-y$ detectors).
\begin{figure}[htbp]
 \centering
 \includegraphics[width=\textwidth]{spine-4}
 \caption{\label{fig:spine-recon}Reconstruction using different settings.}
 \end{figure}

To evaluate the localization accuracy in recovering the simulated fluorescent inclusion, distances between the centroid position of the 50\% isovolume of the reconstructed object and the expected position were calculated.
%The resolution was evaluated by the volume of the 50\% isovolume, where larger volume implying lower resolution.
The simulation results were also evaluated based on their quantification performance, by comparing the expected value and the average reconstructed fluorescence value for the 50\% isovolume. Table \ref{tab:spine-error} summarizes the localization and quantification errors in the case of CW and TG data type for each independent view (90 degrees apart) and a combination of 4 views, using both homogeneous and heterogeneous Jacobians.

\begin{table}
\caption{\label{tab:spine-error}{Localization and quantification comparison for the fluorophore concentration reconstructions.}}
\centering
\def\temptablewidth{\textwidth}
{\rule{\temptablewidth}{1pt}}
\begin{tabular}{cccccccc}
&\multirow{3}{*}{View}& \multicolumn{2}{c}{Localiz. error (mm)}&\multicolumn{2}{c}{Vol. (mm$^3$)}&\multicolumn{2}{c}{Quantif. (\%)} \\
&& CW&TG&CW&TG&CW&TG \\
\hline
\multirow{4}{*}{Hete.}
  &1   &1.01& 0.91& 85.21 & 79.77 & 48.12& 52.12\\
  &2   &1.04& 1.02& 63.21 & 53.75 &72.35& 73.21\\
  &3   &1.26& 1.10& 105.22 & 89.67 &56.51& 49.77\\
  &4   &1.54& 1.47& 120.56 & 119.88 &38.75& 45.77\\
  &Comb. &0.78& 0.50& 89.75 & 37.53& 73.43& 89.36\\
\hline
\multirow{4}{*}{Homo.}
  &1   &1.62& 1.08& 102.33& 85.54 & 32.54& 44.84\\
  &2   &1.25& 1.79& 75.65& 67.64 & 51.15& 57.44\\
  &3   &1.47& 1.22& 119.42& 92.75 & 45.86& 39.76\\
  &4   &1.71& 1.60& 137.63& 112.38 & 34.91& 38.77\\
  &Comb. &1.28& 1.21& 93.65& 54.75 & 59.97& 72.64
\end{tabular}
{\rule{\temptablewidth}{1pt}}
\end{table}

%\begin{table}
%\caption{\label{tab:spine-error}{Localization and quantification comparison for reconstructions in the spine.}}
%\centering
%\def\temptablewidth{\textwidth}
%{\rule{\temptablewidth}{1pt}}
%\begin{tabular}{cccccccccc}
%&\multirow{3}{*}{View}& \multicolumn{6}{c}{Localiz. (mm)}&\multicolumn{2}{c}{Quantif. (\%)} \\
%&& \multicolumn{3}{c}{CW}&\multicolumn{3}{c}{TG}&CW&TG \\
%&&$x$&$y$&$z$&$x$&$y$&$z$&&\\
%\hline
%\multirow{4}{*}{Hete.}
%  &1   &0.35&0.37&0.87& 0.36&0.34&0.76& 50.32& 56.32\\
%  &2   &1.04&0.53&0.47& 0.78&0.48&0.44& 35.43& 36.05\\
%  &3   &0.45&0.36&1.03& 0.28&0.40&0.99& 76.11& 78.33\\
%  &4   &1.07&0.66&0.88& 1.04&0.68&0.79& 71.34& 72.45\\
%  &Comb. &0.12&0.21&0.32& 0.40&0.37&0.56& 78.43& 80.10\\
%\hline
%\multirow{4}{*}{Homo.}
%  &1   &0.62&0.50&1.41& 0.48&0.39&0.89& 39.30& 43.38\\
%  &2   &1.25&0.61&0.72& 1.07&0.54&0.66& 38.15& 39.44\\
%  &3   &0.67&0.46&1.23& 0.57&0.55&0.93& 44.08& 44.76\\
%  &4   &1.32&0.70&0.84& 1.25&0.67&0.74& 39.21& 41.87\\
%  &Comb. &0.68&0.55&0.93& 0.55&0.47&0.69& 48.87& 50.67
%\end{tabular}
%{\rule{\temptablewidth}{1pt}}
%\end{table}
%

Overall, in all cases investigated herein, the fluorophore was reconstructed in the disk with good localization accuracy. The fluorophore was always localized with less than 2mm error and with only 0.5mm inaccuracy in the best case scenario. As expected, multiple views gave better localization compared to the single views. When the views were combined, the localization errors were half of the smallest errors achieved by individual views in case of the heterogeneous model. In case of the homogeneous model, although an overall significant reduction was observed, the multi view localization error was matching the minimum error of the individual views. Also, as expected, time-gated FMT outperformed CW reconstructions in terms of localization. However, the gain in accuracy in using TG over CW is not commensurate with the gain of using multiple views, suggesting that the localization accuracy is more affected by the geometrical sampling strategy than the data type employed. Based on multiple views, accurate localization of the fluorophore can be achieved within the order of the mesh discretization scale (0.3mm). The localization errors reported in Table \ref{tab:spine-error} fall below 2\% of the actual dimension of spine along its smallest dimension for the best reconstruction scheme (~5\% for the worse case).

In terms of quantification, the different strategies investigated exhibited similar trends than the localization parameter. The best quantification was achieved with TG dataset and heterogeneous Jacobian (89.36\%) while the worst case was using CW data and homogenous Jacobian (59.97\%). Combining multiple views improved quantification. The main difference between localization and quantification trends lies with the data type. TG data significantly outperform CW data overall. To better understand this aspect, we provided the volume of the object reconstructed (as estimated with the 50\% iso-volume). As seen, the volume of the fluorescent heterogeneity is significantly smaller when using TG and multiple views. The contrast retrieved is then more focal and hence, better quantified. As described previously, the reconstruction scheme employed herein capitalizes on the early gate data to improve resolution. The incorporation of this data type reduced the volume of the object reconstructed by \~2. Even in the case of homogeneous Jacobian, the incorporation of the early gate led to reliable reconstructions. This indicates that multiple view and TG data type can be used for accurate FMT of the sample without generating an heterogenous model. This eliminates the necessity of acquiring high-resolution anatomical images which can be a tedious and costly step.

\section{Experimental validation}
% needs to be rewritten
\noindent In this section, the performance of the mMC method is evaluated with experimental data for FMT on an euthanized mouse. A showcase of the flexibility of mMC is also presented by applying the code to reconstruction of optical heterogeneity of a compressed breast model using punctual illumination and detection in CW mode.

\subsection{Preclinical model}
We used the same dataset as appeared in Section \ref{sec:fluo_mouse_exp} for the reconstruction test. The surface geometry of the region of interest ($40$mm$\times$31$mm$) from the CT images was extracted to generate a homogeneously tesselated model with 92,713 elements and 15,581 nodes for the mesh-based weight matrix calculation. The entire model was assigned the average background properties of the small animal acquired by spectroscopic fitting of the excitation measurements ($\mu_a = 0.3$cm$^{-1}$ and $\mu_s' = 25$cm$^{-1}$). The results of the forward simulations were stored in both node-based format (photon package weights were accumulated at each node) and element-based format (the result was saved for each element) to calculate the weight matrix. The number of nodes was only 16.8\% of elements thus the node-based format had advantages in both storage space and speed of data processing. The reconstructions employing weight matrices using the two formats were compared, in order to assess the possible loss of accuracy from using node-based format. The early-gate at 25\% of the maximum value was selected in reconstruction similarly to the synthetic study. For comparison, a reconstruction employing the same measurement dataset on a voxel-based model with 92,831 0.5mm$^3$ non-air voxels was performed.

Fig.~\ref{fig:exp_result}a shows the 50\% isovolume of the reconstructed effective quantum yield using the mMC node-based results. This isovolume has a maximum length of 11.3mm and mean diameter of 2.7mm. Moreover, sample slices of cross-section overlaid on the corresponding CT slices demonstrate accurate localization of the inclusion (Figs.~\ref{fig:exp_result}b-d). The element-based (92,713 elements) and node-based (15,581 nodes) reconstructions are  shown for comparison in Fig.~\ref{fig:exp_result}b and ~\ref{fig:exp_result}c, respectively. The node-based and element-based weight matrices resulted in very close reconstructions in both quantification ($<2$\% difference in maximum reconstructed value) and position of inclusion. The centroid position error of the 50\% isovolume using vMC was 2.83mm while that using mMC was 0.56mm.
\begin{figure}[htbp]
 \centering
 \includegraphics[width=1.01\textwidth]{widefmmc-8}
 \caption{\label{fig:exp_result}(a) 3D reconstruction showing the 50\% isovolume of the reconstructed object (blue) and the fluorescent vessel (red). Axial and frontal slices at $z =$6.5mm and $y =$ 21.5mm for (b) node-based, (c) element-based and (d) voxel-based reconstructions, respectively.}
 \end{figure}

\subsection{Optical mammography}
% sample preparation and data acquisition
The second test was focused on optical mammography to demonstrate the potential of mMC in thick tissue imaging. An optical breast scan was provided by University of Pennsylvania \cite{Busch2011}. The subject was a patient with biopsy-confirmed invasive ductal carcinoma. The optical dataset was acquired simultaneously with the multimodel GenIIm system for contrast enhanced DOT and MRI examination, and the MR images were co-registered \textit{ex post facto} with the optical images using fiducial markers attached to the fibers. The baseline optical and MR images were first obtained, then ICG and Gd-DTPA were sequentially injected into the patient. A CCD camera was employed for measuring the CW signals from the detection fibers (30s per frame). The geometry and fiber alignment are shown in Fig. \ref{fig:breast_set}a and \ref{fig:breast_set}b, with 13 fibers for detection and 19 for illumination. The carcinoma was positioned between the two rows of detector fibers in the GenIIm system.

% mesh model generation
The anisotropic MRI images (256$\times$256$\times$24) with 0.78mm step sizes for $x$ and $y$ coordinate and 3.4mm for $z$ coordinate were converted to realistic volumetric images with isotropic voxels (Fig. \ref{fig:breast_set}c). Based on the smoothed surface extracted from this volumetric data, a mesh model was generated (Fig. \ref{fig:breast_set}d) for the $z > 6cm$ region using the publicly available iso2mesh package \cite{Fang2009a}. The final mesh with 26,078 elements and 4,532 nodes was then rotated by $x$ axis for the simulation.
\begin{figure}[htbp]
 \centering
 \includegraphics[width=\textwidth]{breast-1}
 \caption{\label{fig:breast_set}(a)Sagittal and (b)frontal Gd-enhanced MRI view of an invasive ductal carcinoma, 10 minutes post injection; black circles are optical detector fibers and red circles source fibers. The cyan circle and the arrow indicate the tumor location. (c) Same data as in (a) and (b), 3D volumetric view. (d) Mesh generated from the smoothed boundary of MRI data, rotated and cropped problematic part.}
\end{figure}

%simulation set up, reconstruction
The bulk optical properties were obtained from data fitting of the measurements detected by time-domain PMTs connected to optical fibers prior to the contrast agent injection. The fitted result ($\mu_a = $ 0.0284cm$^{-1}$, $\mu_s' = 12.04$cm$^{-1}$) was employed in the simulations for generating the node-based Jacobians. Based on the discussions in Chapter \ref{chp:comp}, 10$^7$ photons (required for stable CW reconstructions) were launched for each source or detector on 4,096 single-thread CPUs at CCNI. The entire dataset took 6 minutes to complete ($13\times19=247$ source-detector pairs). Only part of the source-detector pairs were used in reconstruction, selected based on SNR and mismatch between the simulated and actual measurements from the baseline images. The reconstruction was conducted using the measurement at the maximum ICG uptake frame without any spatial \textit{a priori} information applied. A full differential approach in which the global uptake of the breast was taken into account by dividing the measurements on the post- and pre-ICG breast \cite{Intes2003, Ntziachristos1999}.

%%result
%datatype Rytov (meas/bkg)
%reconstruction
The reconstructed ICG uptake map is shown in Fig. \ref{fig:breast_result}, where the carcinoma can be observed with accordance to the Gd-DTPA location in the MR images. The positioning of the carcinoma in the frontal view has a slight offset, potentially resulted from the boundary mismatch of the measurements.
\begin{figure}[htbp]
 \centering
 \includegraphics{breast-2}
 \caption{\label{fig:breast_result}(a)Sagittal and (b)frontal reconstruction images for the subject with carcinoma at the maximum uptake frame.}
\end{figure}

This mammography showcase reveals that mMC is not confined within preclinical usages, but also applicable for a wide range of clinical applications, able to achieve great accuracy and computational efficiency simultaneously. Although the calculation time (12s per source/detector pair) is longer than simulations on mouse models using $10^7$ photons (7s per source/detector) due to the thicker volume to probe, this calculation is highly efficient. For the specific optical mammography case presented herein, a full computation was performed in 6 minutes, indicating the potential to provide 3D capability in near real-time to the physician.

%\section{Optimizing the mesh in tomography}
%\subsection{Mesh coarsening based on statistics}
%\subsection{Adapted solution field}
%\subsection{Mesh adaptation by reconstructed maps}

\section{Discussion}
In this study, we successfully implemented the widefield illumination for calculating the propagation of light through arbitrary 3D media in mesh-based models. After implementing an optimized ray-test step for determining the injecting point for each photon, computational time for widefiled illumination simulations becomes only marginally longer (5-10\%) than point-source simulations. This is a remarkable improvement compared to the 50 times increase when no optimization was employed in the widefield illumination.

Moreover, the parabolization of the code is a significant advancement of the computation capacity. The aim of developing a mixed mode MPI / OpenMP code is to utilize the full computation availability of single symmetric multiprocessors (SMPs) or a cluster of SMPs, and achieve a performance improvement over a pure MPI code. As expected, when the number of threads/processes is the same, the most efficient parallelization implementation is using pure multi-threading (OpenMP). Although the number of threads on a single SMP is restricted at current time (usually less than 8), larger systems of single SMP could become available in the foreseeable future with 64-bit memory addressing. The efficiency comparison of hybrid code with the pure MPI implementation reveals that a performance improvement is achieved: the overall computational time has reduced and the scaling of the code with increasing thread number is better. Hence a combination of MPI and OpenMP parallelization paradigms provides more efficient parallelization strategy. In addition, the hybrid code makes full use of the memory on each node due to the shared memory mechanism, allowing for the possibility of running larger-scale programs. However, the improvement in time is quite small since the pure MPI parallelization algorithm for forward MC method is optimized and scales very well already, with no significant impairment due to load imbalance or memory limitation. Overall, the three parallelization implementations have excellent scalability with almost linear acceleration when parallelized platform is available. Thus when more threads/processes are utilized, the faster the simulation will be.

%% accuracy in wide field and time domain
The widefield implementation enables one to perform simulations with spatial variation in illumination intensity, which is essential for the emerging pattern light tomography applications \cite{Joshi2006, dutta2010,DAndrea2010}. Compared to the conventional punctual illumination, the widefield light illumination results in more errors on the boundaries where the model mismatch exists due to the larger illumination area, in both the forward propagation and Jacobians. When the time-gated information is considered, the RMS difference comparison represents a significant improvement in accuracy for mMC at early gates, yet with an acceleration of $\sim$5x compared to vMC. This is of importance as the early gates are not modeled accurately by diffusion equation while the incentive of using early gates to increase spatial resolution in FMT is rising. However, this accuracy improvement in Jacobian does not necessarily relates to the reconstruction performance (13.2\% discrepancy in the RMS differences for Jacobian at the selected time gate, 3.4\% difference in the maximum reconstructed value comparing the mMC and vMC reconstruction). This can be explained by the fact that the disparity mainly exists around the boundaries while the expected fluorescent objects are closer to the center.

%% limitations
One of the limitations of the current implementation is that it does not model the time difference when photons from the source plane project to the boundary. However, according to the light propagation velocity in air, the maximum error in time for the typical specimen thickness (around 2cm) in preclinical applications is $\scriptsize{\sim}$3ps, which is negligible compared to the minimum time interval of 20ps for the time-resolved platform. Another potential difficulty for \emph{in vivo} assessment may resides in the availability of anatomical atlas, as high resolution imaging modalities are required to obtain accurate surfaces for mesh-based reconstructions.

%% mesh optimization availabilty
As a few examples, this code can be used for investigating the fluorophore distribution with time-gated data type while serving as the forward solver for MRI guided FMT. In this work, the mesh model for the forward and reconstruction processes is confined in homogeneous tissue and uniformly tessellated; however, it is noteworthy that it can also present in highly heterogeneous medium with arbitrary internal/external boundaries while retaining a small number of nodes and elements, thanks to its flexibility. The representation of mesh-based models for high resolution 3D anatomic images can be further optimized (eg. reducing the nodes while keeping the geometry unchanged) to benefit the accuracy of the reconstructions.

%% voxel->node mesh, memory taken
The rationale of similar results using node and element based weight matrix is that these methods render essentially the same spatial profiles of photon propagation, though the number of nodes for both cases are much less than that of tetrahedron elements (15,581 nodes compared to 92,713 elements). Furthermore, in the ill-posed inverse problem, the effective information mainly relies on the measurements but not the image-space spatial distribution when the number of measurements (dependent on the number of SD pairs and time-gates selected) is much less than that of the unknowns. Casting nodes as the unknowns in the inverse problem leads to a much smaller unsolved linear system \cite{Yalavarthy2008}, which is superior as the memory constraint is usually a burden in reconstructions, especially when the time-resolved data are employed to provide extra information. Due to this advantage, employing nodes instead of elements also allows for the use of denser source-detector pairs, which is crucial for improving resolution of the reconstructed objects.

%% DOT, thick tissue imaging
In addition, the formulation for diffuse optical tomography (DOT) using the MC method is readily available for the adjoint method, thus the mMC method can be easily applied to DOT to quantitatively determine the optical properties, or functional parameters (blood volume and hemoglobin oxygenation) related to optical properties as well. The example for imaging ICG uptake in breast shows the potential of using mMC for DOT within an efficient time frame in various applications. Moreover, as FMT/DOT is becoming frequently used in combination with high-resolution imaging methods such as MRI, CT, X-Ray and ultrasound \cite{Azar2008}, mMC can serve as an effective tool to incorporate high-resolution structural information into FMT/DOT with limited spatial resolution for physiological research.

\section{Summary}
This chapter focuses on assessing the feasibility and evaluating the computational performance of the widefield mMC algorithm for time-resolved FMT, and the improvement of representing models with complex boundaries. An optimized software platform was developed utilizing multi-threading and distributed parallel computing to achieve efficient calculation. We demonstrated that the mMC method is a computationally efficient solution for tomographic use when modeling of general media is required (about 5 times faster than vMC). Simulations on a complex digital mouse phantom established that both mMC and vMC match well with the high resolution vMC output in temporal and spatial profiles for punctual SD settings; however, for widefield illumination, the accuracy of mMC is improved compared to that of vMC. The experimental results show that mMC also has the potential for reduced memory utilization. The current version of this code is available for download as MMC v1.0 at http://mcx.sourceforge.net/mmc/, and more features may be added, such as a mesh-based way to represent illumination patterns.

